Multiword noun compound bracketing using Wikipedia
نویسندگان
چکیده
This research suggests two contributions in relation to the multiword noun compound bracketing problem: first, demonstrate the usefulness of Wikipedia for the task, and second, present a novel bracketing method relying on a word association model. The intent of the association model is to represent combined evidence about the possibly lexical, relational or coordinate nature of links between all pairs of words within a compound. As for Wikipedia, it is promoted for its encyclopedic nature, meaning it describes terms and named entities, as well as for its size, large enough for corpus-based statistical analysis. Both types of information will be used in measuring evidence about lexical units, noun relations and noun coordinates in order to feed the association model in the bracketing algorithm. Using a gold standard of around 4800 multiword noun compounds, we show performances of 73% in a strict match evaluation, comparing favourably to results reported in the literature using unsupervised approaches.
منابع مشابه
A System for Compound Noun Multiword Expression Extraction for Hindi
Compound noun multiword expressions are important for many NLP applications like machine translation and information retrieval. This paper describes a system for Hindi compound noun multiword expressions (MWE) extraction from a given corpus. We identify major categories of compound noun MWEs, based on linguistic and psycholinguistic principles. Our extraction methods use various statistical co-...
متن کاملA Dataset for Joint Noun-Noun Compound Bracketing and Interpretation
We present a new, sizeable dataset of noun– noun compounds with their syntactic analysis (bracketing) and semantic relations. Derived from several established linguistic resources, such as the Penn Treebank, our dataset enables experimenting with new approaches towards a holistic analysis of noun–noun compounds, such as jointlearning of noun–noun compounds bracketing and interpretation, as well...
متن کاملScaling Up BioNLP: Application of a Text Annotation Architecture to Noun Compound Bracketing
We describe the use of the Layered Query Language and architecture to acquire statistics for natural language processing applications. We illustrate system’s use on the problem of noun compound bracketing using MEDLINE.
متن کاملSearch Engine Statistics Beyond the n-Gram: Application to Noun Compound Bracketing
In order to achieve the long-range goal of semantic interpretation of noun compounds, it is often necessary to £rst determine their syntactic structure. This paper describes an unsupervised method for noun compound bracketing which extracts statistics from Web search engines using a χ measure, a new set of surface features, and paraphrases. On a gold standard, the system achieves results of 89....
متن کاملLinked Open Data and Web Corpus Data for noun compound bracketing
This research provides a comparison of a linked open data resource (DBpedia) and web corpus data resources (Google Web Ngrams and Google Books Ngrams) for noun compound bracketing. Large corpus statistical analysis has often been used for noun compound bracketing, and our goal is to introduce a linked open data (LOD) resource for such task. We show its particularities and its performance on the...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014